Performance Evaluation of Two Arabic OCR Products

Authors

  • Tapas Kanungo
  • Gregory A. Marton
  • Osama Bulbul
Abstract

Numerous Optical Character Recognition (OCR) companies claim that their products have near-perfect recognition accuracy (close to 99.9%). In practice, however, these accuracy rates are rarely achieved. Most systems break down when the input document images are highly degraded, such as scanned images of carbon-copy documents, documents printed on low-quality paper, and documents that are n-th generation photocopies. Moreover, end users cannot compare the relative performance of the products because the reported accuracy results are not obtained on the same dataset. In this article we report our evaluation results for two popular Arabic OCR products: (i) Sakhr OCR and (ii) OmniPage for Arabic. Our evaluation establishes that the Sakhr OCR product has a 15.47% lower page error rate relative to the OmniPage page error rate. The absolute page accuracy rates for Sakhr and OmniPage are 90.33% and 86.89%, respectively. The evaluation was performed on the SAIC Arabic image dataset, using only those pages for which both OCR systems produced output. A scatter plot of the page accuracy-rate pairs reveals that Sakhr generally performs better on low-accuracy (degraded) pages. The scatter-plot visualization technique also allows an algorithm developer to easily detect and analyze outliers in the results.
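
The paired setup described in the abstract (keep only pages that both systems processed, then compare the per-page accuracy pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the page identifiers, the accuracy values, and the gap threshold are all hypothetical, and the numbers are made up for the example rather than taken from the SAIC dataset.

```python
from statistics import mean


def paired_pages(acc_a, acc_b):
    """Return (page, accuracy_a, accuracy_b) for pages present in both results.

    Pages for which either system produced no output are dropped, mirroring
    the restriction described in the abstract.
    """
    common = sorted(set(acc_a) & set(acc_b))
    return [(page, acc_a[page], acc_b[page]) for page in common]


def mean_accuracies(pairs):
    """Mean page accuracy of each system over the paired pages."""
    return mean(a for _, a, _ in pairs), mean(b for _, _, b in pairs)


def outlier_pages(pairs, min_gap):
    """Pages whose two accuracies differ by at least min_gap percentage points.

    On a scatter plot of the pairs, these are the points far from the diagonal.
    """
    return [page for page, a, b in pairs if abs(a - b) >= min_gap]


# Made-up per-page accuracies (percent) for two hypothetical systems.
system_a = {"p1": 95.0, "p2": 60.0, "p3": 88.0, "p4": 91.0}
system_b = {"p1": 94.0, "p2": 35.0, "p3": 90.0}  # no output for p4

pairs = paired_pages(system_a, system_b)
mean_a, mean_b = mean_accuracies(pairs)
flagged = outlier_pages(pairs, min_gap=20.0)
```

Plotting the `(accuracy_a, accuracy_b)` values from `pairs` against each other gives the kind of scatter plot the abstract refers to; the pages returned by `outlier_pages` would be the natural candidates for manual inspection.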

Similar resources

Paired Model Evaluation of OCR

Characterizing the performance of Optical Character Recognition (OCR) systems is crucial for monitoring technical progress, predicting OCR performance, providing scientific explanations for system behavior and identifying open problems. While research has been done in the past to compare the performances of OCR systems, all methods assume that the accuracies achieved on individual documents in a...

OmniPage vs. Sakhr: paired model evaluation of two Arabic OCR products

Characterizing the performance of Optical Character Recognition (OCR) systems is crucial for monitoring technical progress, predicting OCR performance, providing scientific explanations for the system behavior and identifying open problems. While research has been done in the past to compare performances of two or more OCR systems, all assume that the accuracies achieved on individual documents ...

Paired Model Evaluation of OCR Algorithms

Characterizing the performance of Optical Character Recognition (OCR) systems is crucial for monitoring technical progress, predicting OCR performance, providing scientific explanations for system behavior and identifying open problems. While research has been done in the past to compare the performances of OCR systems, all methods assume that the accuracies achieved on individual documents in a...

The Bible, truth, and multilingual OCR evaluation

Multilingual OCR has emerged as an important information technology, thanks to the increasing need for cross-language information access. While many research groups and companies have developed OCR algorithms for various languages, it is difficult to compare the performance of these OCR algorithms across languages. This difficulty arises because most evaluation methodologies rely on the use of a doc...

The Bible, Truth, and Multilingual OCR

Multilingual OCR has emerged as an important information technology, thanks to the increasing need for cross-language information access. While many research groups and companies have developed OCR algorithms for various languages, it is difficult to compare the performance of these OCR algorithms across languages. This difficulty arises because most evaluation methodologies rely on the use of a do...

Publication date: 1998